ABSTRACT

Apparatus, methods and computer program products are
disclosed for a high level language compiler that includes a
binary re-optimization capability. This re-optimization
capability inputs a binary executable and outputs a binary
module optimized for a target computer system. The binary
module can be linked to create an optimized binary
executable. This capability is provided by adding a front end
segment to the compiler that reads the binary executable and
creates an intermediate representation of the binary
executable. This intermediate representation is normalized to
remove prior optimization artifacts and to virtualize register
usage. The intermediate representation is then optimized for
a target computer system resulting in a binary module that
can be linked to make a binary executable that is optimized
for the target computer.

18 Claims, 7 Drawing Sheets 
METHOD, APPARATUS AND COMPUTER PROGRAMMED PRODUCT FOR BINARY
RE-OPTIMIZATION USING A HIGH LEVEL LANGUAGE COMPILER

BACKGROUND OF THE INVENTION

Field of the Invention

This invention relates to the field of optimizing compilers.
Specifically, this invention is a method, apparatus and
computer program product for providing a high level language
compiler with the capability to re-optimize a previously
compiled binary executable.

Background

FIG. 1 illustrates a prior art optimizing compiler, indicated
by general reference character 100, for compiling a source
program to create an optimized binary executable. The
compiler 100 consumes source information 101 through a
compiler front-end segment 103. The compiler front-end
segment 103 processes the syntax and semantics of the source
information 101 according to the rules of the programming
language applicable to the source information 101. The
compiler front-end segment 103 generates at least one version
of an intermediate code representation 104 (IR) of the source
information 101. The intermediate code representation
generally includes data structures that either represent, or
can be used to create, data dependency graphs (DDGs) and
execution flow graphs. The intermediate code representation
104 is then optimized by an intermediate representation
optimizer segment 105. The intermediate representation
optimizer segment 105 operates on, and adjusts, the
intermediate code representation 104 of the source
information 101 to optimize the execution of a program in a
variety of ways well understood in the art. The intermediate
representation optimizer segment 105 generates an optimized
intermediate representation 106. A code generator segment 107
consumes the optimized intermediate representation 106,
performs low level optimizations, allocates physical
registers and generates binary module 109 (and conditionally
assembler source code) from the optimized intermediate
representation 106. The binary module comprises binary
computer instructions (binary code) in a module that can be
linked with other modules to create a binary executable. The
assembler source code is a series of symbolic statements in
an assembler source language. Both the assembler source code
and the binary code are targeted to a particular computer
application binary interface (ABI).

DDGs embody the information required for an optimizer to
determine which statements are dependent on other statements.
The nodes in the graph represent statements in a programmed
block and arcs represent the data dependencies between nodes.
In particular, the scope of a variable extends from a "def"
of the variable to a "use" of the variable. A "def"
corresponds to an instruction that modifies a variable (an
instruction "defines" a variable if the instruction writes
into the variable). A "use" corresponds to an instruction
that uses the contents of the variable. For example, the
instruction "x=y+1;" "defs x" and "uses y". An arc in the DDG
extends from the def of a variable to the use of the
variable. DDGs are described in chapter 4 of Supercompilers
for Parallel and Vector Computers, by Hans Zima, © 1991, ACM
press, ISBN 0-201-17560-6, 1991.

As mentioned above, the code generator segment 107 performs
low level optimizations and generates either (or both) binary
code or assembler source code. The intermediate
representation of the program generally references virtual
registers. That is, the intermediate representation optimizer
assumes that the target computer contains an unlimited number
of registers. During the operation of the code generator
segment 107, these virtual registers are assigned to the
physical registers of the target computer. This resource
management is performed in the code generator segment 107 by
a register allocation (expansion) process. One aspect of the
register allocation process is that the contents of physical
registers are often "spilled" to memory at various points
during the execution of the program so that the limited
number of physical registers can be used to hold values of
more immediate relevance to the program at those various
points. Those values that are spilled to memory are often
restored to the registers when the program advances to
different points of execution. An example of register
allocation and register spilling techniques is provided in
Compilers: Principles, Techniques and Tools by Alfred V. Aho,
Ravi Sethi, and Jeffrey D. Ullman, Addison-Wesley Publishing
Co., ISBN 0-201-10088-6, pages 541-546.

Execution flow graphs represent the sequence of operations
executed by the program. Information can be included on the
graph's edges to provide scheduling information such as data
dependency information, frequency of execution information,
or other information useful for optimizing the operations
represented by the execution flow graph.

Software pipelining is a technique for scheduling the
execution of instructions. In the case of simple basic block
loops, software pipelining schedules different overlapping
iterations of the loop body to exploit a computer's
underlying parallel computation units. The execution schedule
consists of a prologue, a kernel, and an epilogue. The
prologue initiates the first p iterations, thus starting each
iteration. A steady state is reached after the first p*II
cycles, where II is the initiation interval, where each
initiated iteration is executing instructions in parallel. In
this steady state or kernel, one iteration of the loop is
completed every II cycles. Once the kernel initiates the last
iteration in the loop, the epilogue completes the last p
iterations of the loop that were initiated by the kernel.
Often the instruction schedule requires that a particular
instruction be initiated after some delay; thus, unfilled
instruction slots in the instruction schedule are filled with
"no-operation" (NOP) instructions.

Computer manufacturers often make a family of computers
having similar architectures. One problem for both computer
manufacturers and computer application developers is the
conflict between the desire of computer manufacturers to
provide more powerful computers with extended capabilities
and that of the program application developers who tend to
optimize an application to execute on the largest number of
computers of a particular family. Although the models in the
architecture family are similar, they often have differences.
(For example, the SPARC™ architecture includes different
application binary interfaces (ABIs), numbers of pipelines,
and other differences between the V8, V8+, and V9 SPARC based
models.) These differences are generally the result of
cost/performance tradeoffs or new architectural features
added to later models. Commercial applications are generally
compiled and optimized to execute using only the capabilities
of the architecture that are shared by each model of the
architecture family. Thus, the application does not use the
advanced capabilities available to the more advanced models.
Because application developers generally do not provide
source code, the user of the application is unable to
optimize the application to take advantage of the additional
features of the more advanced models; thus, the application
will not perform as efficiently as if it were optimized
specifically for the computer model that executes the
application.
Another problem is that compiler optimization generally
assumes that each execution flow path is equally likely to be
executed during the operation of the application. This means
that the compiler does not optimize the execution flow path
according to how the program actually operates. However,
applications can be instrumented to capture an execution
profile when operating on a particular data set. This profile
information could be captured, and used to optimize an
application specifically for use with that particular data
set. In addition, some computer architectures provide memory
performance information, such as cache-miss information for
the memory caches. This information could also be used to
optimize memory organization for a particular data set and
usage pattern. These optimizations could include
restructuring and rescheduling the code, using pre-fetch
instructions and non-faulting loads, if the result of a
branch instruction can be predicted, and other optimizations
based on how the application performs when it executes.
However, because application developers do not provide the
application's source code for the user to compile, these
optimizations are not available to the user of the
application.

Yet another problem is that operating system facilities
sometimes become obsolete. These obsolete services are
usually retained and can be invoked by an obsolete operating
system service invocation, but newer more efficient services
are also provided. These newer services are invoked by
preferred operating system service invocations. An
application compiled to use the obsolete services cannot use
the newer services.

Thus, it would be advantageous to provide a high level
language compiler with the capability to re-optimize a binary
executable, originally not optimized, partially optimized, or
optimized for a particular computer system, so that the
binary executable is optimized for another targeted computer
system.

SUMMARY OF THE INVENTION

The present invention includes methods, apparatus and
computer program products that re-optimize a binary
executable for a target computer system. One aspect of the
invention includes a computer controlled method for
converting a first binary executable that executes within a
first application binary interface (ABI) environment to a
second binary executable that executes within a second ABI
environment. The method is performed by a high level language
compiler. The method includes the step of converting the
first binary executable to an intermediate representation.
Then the intermediate representation is processed to remove
an architectural related optimization that depends on the
first ABI environment. The method also includes the steps of
optimizing the intermediate representation for the second ABI
environment and of generating the second binary executable.

Another aspect of the invention includes an apparatus having
a central processing unit (CPU) and a memory coupled to said
CPU for converting, by a high level language compiler, a
first binary executable, that executes within a first
application binary interface (ABI) environment, to a second
binary executable, that executes within a second ABI
environment. The apparatus includes a binary disassembler
mechanism that is configured to convert the first binary
executable to an intermediate representation. The
intermediate representation, generated by the binary
disassembler mechanism, is processed by an intermediate
representation normalization mechanism that is configured to
remove an architectural related optimization that depends on
the first ABI environment. The apparatus also includes an
intermediate representation optimizer mechanism that is
configured to optimize the intermediate representation for
the second ABI environment. In addition, the apparatus
includes a code generation mechanism that is configured to
generate the second binary executable by using the
intermediate representation that is optimized for the second
ABI environment.

Another aspect of the invention is a computer program product
that includes computer readable code, embedded in a computer
usable storage medium, for causing a computer to convert, by
a high level language compiler, a first binary executable,
that executes within a first application binary interface
(ABI) environment, to a second binary executable, that
executes within a second ABI environment. When executed on a
computer, the computer readable code causes a computer to
effect a binary disassembler mechanism, an intermediate
representation normalization mechanism, an intermediate
representation optimizer mechanism and a code generation
mechanism. Each of these mechanisms has the same functions as
the corresponding mechanisms for the previously described
apparatus.

These and other features of the invention will become
apparent when the following detailed description is read in
combination with the accompanying figures.

DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a prior art compiler architecture;
FIG. 2 illustrates a computer system capable of using the
invention in accordance with a preferred embodiment;
FIG. 3 illustrates a high level language re-optimizing
compiler architecture in accordance with a preferred
embodiment;
FIG. 4 illustrates a binary re-optimization process in
accordance with a preferred embodiment;
FIG. 5 illustrates an intermediate representation
normalization process in accordance with a preferred
embodiment;
FIG. 6 illustrates a binary executable update process in
accordance with a preferred embodiment; and
FIG. 7 illustrates a profiled re-optimization process in
accordance with a preferred embodiment.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Notations and Nomenclature

The following 'notations and nomenclature' are provided to
assist in the understanding of the present invention and the
preferred embodiments thereof.

Application Binary Interface (ABI): The ABI is a binary
standard that defines the opcodes and system services that
are provided by a computer system and are available to an
application program.

Architectural related optimization: An architectural related
optimization is an optimization that depends on particular
aspects of the architecture of the computer system that will
execute a binary executable. Example architectural related
optimizations include (without limitation) optimizations made
that are dependent on the number of pipelines in the computer
system, the number of available registers, memory cache
structure, operating system services and other such
facilities.

Binary executable: A binary executable is the data that is
loaded into a computer's memory and that is executed by the
computer's CPU.

Execution profile: An execution profile is a collection of
data, gathered while a program executes, that reveals
(without limitation) which procedures in the program are most
frequently executed, the execution flow of the procedures,
memory cache utilization, and other information that can be
used to analyze the program's performance. The execution
profile is often obtained by inserting instrumentation
procedures within a binary executable.

Intermediate representation (IR): The intermediate
representation is the representation of a source program that
results after the source program has been processed by a
compiler's front-end segment. The intermediate representation
represents the structures and operations described in the
source program but in a form that is efficiently processed by
subsequent segments of the compiler.

Memory-cache performance information: The memory-cache
performance information is information obtained from the
computer's memory management system that provides information
relating to which memory accesses generate cache misses. This
information is used to optimize memory usage.

Pipeline scheduling artifact: A pipeline scheduling artifact
is a specific architectural related optimization that
effectuates a computer system's parallel processing
capabilities. These artifacts include (without limitation)
filling unused pipeline slots with NOP instructions,
instructions used to effectuate a pipeline schedule, and
other such optimizations.

Procedure: A self-consistent sequence of steps leading to a
desired result. These steps are those requiring physical
manipulation of physical quantities. Usually these quantities
take the form of electrical or magnetic signals capable of
being stored, transferred, combined, compared, and otherwise
manipulated. These signals are referred to as bits, values,
elements, symbols, characters, terms, numbers, or the like.
It will be understood by those skilled in the art that all of
these and similar terms are associated with the appropriate
physical quantities and are merely convenient labels applied
to these quantities.

Overview

The manipulations performed by a computer in executing
programmed instructions are often referred to in terms, such
as adding or comparing, that are commonly associated with
mental operations performed by a human operator. In the
present invention no such capability of a human operator is
necessary in any of the operations described herein. The
operations are machine operations. Useful machines for
performing the operations of the invention include programmed
general purpose digital computers or similar devices. In all
cases the method of computation is distinguished from the
method of operation in operating a computer. The present
invention relates to method steps for operating a computer in
processing electrical or other (e.g., mechanical, chemical)
physical signals to generate other desired physical signals.

The invention also relates to apparatus for performing these
operations. This apparatus may be specially constructed for
the required purposes or it may comprise a general purpose
computer as selectively activated or reconfigured by a
computer program stored in the memory of a computer. The
procedures presented herein are not inherently related to a
particular computer or other apparatus. In particular,
various general purpose machines may be used with programs
written in accordance with the teachings herein, or it may
prove more convenient to construct more specialized apparatus
to perform the required method steps. The required structure
for a variety of these machines will appear from the
following description. Also, the invention may be embodied in
a computer readable storage medium encoded with a program
that causes a computer to perform the programmed logic.

Operating Environment

Some of the elements of a computer system, as indicated by
general reference character 200, configured to support the
invention are shown in FIG. 2 wherein a processor 201 is
shown, having a central processor unit (CPU) 203, a memory
section 205 and an input/output (I/O) section 207. The I/O
section 207 is connected to a keyboard 209, a display unit
211, a disk storage unit 213 and a CD-ROM drive unit 215. The
CD-ROM drive unit 215 can read a CD-ROM medium 217 that
typically contains a program and data 219. The CD-ROM drive
unit 215, along with the CD-ROM medium 217, and the disk
storage unit 213 comprise a filestorage mechanism. One
skilled in the art will understand that the CD-ROM drive unit
215 can be replaced by a floppy disk, magnetic tape unit or
similar device that accepts a removable media that can
contain the program and data 219. Such a computer system is
an example of a system that is capable of executing
procedures that embody the invention.

FIG. 3 illustrates a high level language re-optimizing
compiler, indicated by general reference character 300, that
is capable of re-optimizing a binary executable. Like the
optimizing compiler 100 of FIG. 1, the re-optimizing compiler
300 can process source information 301 by a compiler
front-end segment 303 that generates an intermediate
representation of the source information 301. This
intermediate representation is optimized by an intermediate
representation optimizer segment 305 that performs
optimizations on the intermediate representation. The
optimized intermediate representation is then processed by a
code generator segment 307 that generates a binary module 309
containing opcodes. A linker application converts the binary
module 309 into a binary executable. The intermediate
representation optimizer segment 305 and the code generator
segment 307 also propagate portions of their internally
collected symbol and alias information as annotations to the
resulting binary module (and corresponding binary
executable). This annotation information is used by a binary
re-optimization process (subsequently described with respect
to FIG. 4) to approximate the compiler's internal state
during the compilation of the source code used to create the
binary executable.

In addition to these components, the re-optimizing compiler
300 includes a binary disassembler segment 311 that inputs
and converts a binary executable 313 to an intermediate
representation. The intermediate representation generated by
the binary disassembler segment 311 is then normalized by an
IR normalization segment 315 that removes selected
optimization artifacts from the IR. The operation of the IR
normalization segment 315 is subsequently described with
respect to FIG. 5. In effect, the binary disassembler segment
311 together with the IR normalization segment 315 perform
the inverse operation of the operation performed by the code
generator segment 307. Once the intermediate representation
is normalized, it is used as input to the intermediate
representation optimizer segment 305 where it is optimized
and processed by the code generator segment 307 to make a
binary module that can be linked by a linker to create a
binary executable. The intermediate representation optimizer
segment 305 of the re-optimizing compiler 300 can also
process profile information 317 generated during execution of
an instrumented binary executable to determine which portions
of the binary executable most need to be optimized. One
skilled in the art will understand that equivalent
embodiments will combine the
functionality of the binary disassembler segment 311 and the
IR normalization segment 315 into one component. 

FIG. 4 illustrates a binary re-optimization process,
indicated by general reference character 400, used by the
re-optimizing compiler 300 of FIG. 3 to re-optimize a binary
executable. The binary re-optimization process 400 initiates
at a 'start' terminal 401 and continues to a disassembler
procedure 403. The disassembler procedure 403 inputs and
converts the binary executable into a disassembled
representation (an intermediate representation). The
disassembler procedure 403 also processes any supplied
annotation information to approximate the compiler's state
that was generated during the compilation of the source. In
particular, the disassembler procedure 403 uses the
annotation information to recreate the program's symbol table
and alias information from the original source compilation.
Next, the intermediate representation is normalized by an
'IR normalization' procedure 405, as is subsequently
described with respect to FIG. 5, to create an intermediate
representation suitable for processing by the intermediate
representation optimizer segment 305 of FIG. 3. Then the
binary re-optimization process 400 conditionally (based on
a user specified preference or option) inputs and processes
execution profile data at a 'process profile data' procedure
407. This information, if provided, is used during an 'IR
optimization' procedure 409. The 'IR optimization' procedure
409 optimizes the intermediate representation for the
computer system (processor and operating system) that will
execute the application. These optimizations include
(without limitation) techniques for interprocedural
optimization, and local, loop, and global scheduling. In
addition, if profile data was processed during the 'process
profile data' procedure 407, this data is used to optimize
the application with respect to the data set and execution
parameters used during the creation of the profile data. Once
the intermediate representation is optimized, a 'binary code
generation' procedure 411 performs low level optimizations
(for example but without limitation, register allocation,
delay slot filling, pipeline scheduling and peephole
optimization) and generates a binary module suitable for
linking. The binary re-optimization process 400 completes
through an 'end' terminal 413. Optionally, the binary
re-optimization process 400 can log information regarding
the optimizations made on the original binary executable
that result in the optimized binary executable. This
information is used for debugging purposes to help locate
problems that occur after several re-optimization iterations.
One skilled in the art will understand that the placement of
many of the optimizations above is a function of the compiler
architecture and that applications of the optimization
techniques are often equivalent independent of their
ordering.
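
The order of the procedures in the binary re-optimization
process 400 can be sketched in Python as follows. This is a
minimal illustration only, assuming a toy IR that is just a
list of instruction strings. The names IR, disassemble,
normalize, optimize, generate and reoptimize are hypothetical
and are not taken from the described compiler; each function
body is a stand-in for the far richer procedure it is named
after.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class IR:
    """Toy intermediate representation (an assumption of this sketch)."""
    instructions: List[str]
    log: List[str] = field(default_factory=list)  # optional optimization log


def disassemble(binary: List[str]) -> IR:
    """Stand-in for the disassembler procedure 403."""
    return IR(list(binary))


def normalize(ir: IR) -> IR:
    """Stand-in for the 'IR normalization' procedure 405 (FIG. 5)."""
    # Here normalization only drops NOP padding left by the earlier compile.
    ir.instructions = [i for i in ir.instructions if i != "nop"]
    ir.log.append("normalized")
    return ir


def optimize(ir: IR, profile: Optional[Dict[str, int]]) -> IR:
    """Stand-in for procedures 407 and 409; the profile is optional."""
    ir.log.append("optimized with profile" if profile else "optimized")
    return ir


def generate(ir: IR) -> List[str]:
    """Stand-in for the 'binary code generation' procedure 411."""
    ir.log.append("generated")
    return list(ir.instructions)


def reoptimize(binary: List[str],
               profile: Optional[Dict[str, int]] = None
               ) -> Tuple[List[str], List[str]]:
    """Run the stages in the order described for process 400."""
    ir = disassemble(binary)
    ir = normalize(ir)
    ir = optimize(ir, profile)
    return generate(ir), ir.log


module, log = reoptimize(["ld r1", "nop", "add r1, r2"])
```

The returned log plays the role of the optional debugging
record mentioned above: it lists which stages touched the
binary, in order.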

FIG. 5 illustrates an IR normalization process, indicated
by general reference character 500, used by the 'IR
normalization' procedure 405 of FIG. 4. The IR normalization
process 500 initiates at a 'start' terminal 501 and continues
to a 'virtualize registers' procedure 503. The 'virtualize
registers' procedure 503 virtualizes the registers to be
independent of the register limitations of the original
computer system and removes register spilling instructions
from the intermediate representation. Next, an 'adjust
instrumentation' procedure 505 conditionally removes, adds,
or ignores profiling instructions in the intermediate
representation. These profiling instructions are used to
gather and save profile data relating to the execution
history and/or the memory cache performance of the executing
program. That is, when the binary executable is executed by
the computer, the profiling procedures measure relevant
characteristics of the executing program and store this
information as profile data. The profile data from a
particular execution is used (either alone or in combination
with other profile data) by the 'process profile data'
procedure 407 to re-optimize the binary executable based on
the execution profile. Next, an 'unfill pipeline delay slots'
procedure 507 detects filled pipeline delay slots and
promotes the instructions used to fill the slot out of the
schedule. The freed slots may be filled with NOP instructions
or the schedule may be collapsed. The 'unfill pipeline delay
slots' procedure 507 may also include additional mechanisms
to detect and remove other pipeline scheduling artifacts from
the intermediate representation. Finally, an 'optimize system
call' procedure 509 conditionally (dependent on a user
profile or command option) replaces obsolete operating system
service invocations with preferred operating system service
invocations. The IR normalization process 500 completes
through an 'end' terminal 511.
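
Parts of the IR normalization process 500 can be sketched in
Python for a single straight-line block of code. This is an
illustrative sketch under stated assumptions, not the
described implementation: the Insn record, the "spill" and
"reload" opcodes, the PREFERRED_CALL table and the service
names in it are all hypothetical. Registers are virtualized
by giving every definition a fresh virtual register, spill
and reload instructions are removed by remembering which
virtual register each spill slot holds, NOP padding is
dropped as a simple stand-in for collapsing the schedule, and
obsolete service names are conditionally replaced.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Insn:
    """Hypothetical IR instruction: opcode, destination, sources."""
    op: str
    dst: Optional[str] = None
    srcs: Tuple[str, ...] = ()


# Hypothetical table mapping an obsolete service to its preferred form.
PREFERRED_CALL: Dict[str, str] = {"old_read": "new_read"}


def normalize(insns: List[Insn], replace_calls: bool = True) -> List[Insn]:
    fresh = count()
    current: Dict[str, str] = {}  # physical register -> virtual register
    slots: Dict[str, str] = {}    # spill slot -> virtual register saved
    out: List[Insn] = []
    for insn in insns:
        if insn.op == "nop":      # pipeline padding: drop it
            continue
        if insn.op == "spill":    # dst is the slot, srcs[0] the register
            slots[insn.dst] = current[insn.srcs[0]]
            continue
        if insn.op == "reload":   # dst is the register, srcs[0] the slot
            current[insn.dst] = slots[insn.srcs[0]]
            continue
        # Rewrite sources first; names never defined here stay unchanged.
        srcs = tuple(current.get(s, s) for s in insn.srcs)
        if insn.op == "call" and replace_calls:
            srcs = tuple(PREFERRED_CALL.get(s, s) for s in srcs)
        dst = insn.dst
        if dst is not None:       # every definition gets a new virtual reg
            current[dst] = f"v{next(fresh)}"
            dst = current[dst]
        out.append(Insn(insn.op, dst, srcs))
    return out


code = [
    Insn("load", "r1", ("a",)),
    Insn("spill", "slot0", ("r1",)),   # r1 is needed for something else
    Insn("load", "r1", ("b",)),
    Insn("nop"),
    Insn("reload", "r2", ("slot0",)),  # the value of a comes back in r2
    Insn("add", "r3", ("r1", "r2")),
    Insn("call", None, ("old_read",)),
]
normalized = normalize(code)
```

After normalization the block no longer depends on how many
physical registers the original target had: the spill and
reload pair has disappeared and the add reads the two loaded
values directly through virtual registers.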

One skilled in the art will understand that the previously
described techniques will optimize or re-optimize a binary
executable that was either not optimized, or optimized for a
different computer system (CPU and/or operating system)
than the computer system targeted by the compilation. Thus,
such a historical binary executable, that is not optimized
for the computer system executing it, can be optimized to use
the available features provided by that specific computer
system executing the binary executable.

FIG. 6 illustrates a re-optimization process, as indicated
by general reference character 600, that re-optimizes a
historical binary executable initially targeted for a
particular computer system. The re-optimization process 600
initiates at a 'start' terminal 601 and continues to a
're-optimize binary executable' procedure 603 that uses the
previously described techniques to generate an optimized
binary module based on the binary executable. Next, a 'save
optimized binary module' procedure 605 saves the optimized
binary generated by the 're-optimize binary executable'
procedure 603. This saved module is then processed by a
linker application (or its equivalent) to generate an
optimized binary executable by a 'link binary module'
procedure 607. The re-optimization process 600 completes
through an 'end' terminal 609.

FIG. 7 illustrates a profile-based re-optimization process,
indicated by general reference character 700, used to
re-optimize a binary executable with respect to its
processing of a particular data set. The process 700
initiates at a 'start' terminal 701 and continues to a
'profile execution' procedure 703. The 'profile execution'
procedure 703 executes an instrumented version of the binary
executable on a specific data set. The instrumentation in the
binary executable includes (without limitation) procedures
that capture the program's execution history, processor and
memory cache performance information, and other information
that one skilled in the art will understand can be used to
optimize execution of the binary executable. An 'execution
profile based optimization' procedure 705 analyzes the
profile information generated by the 'profile execution'
procedure 703 to optimize the binary executable with
respect to that particular data set. The 'execution profile
based optimization' procedure 705 also conditionally
removes, adds or optimizes the instrumentation procedures
on the binary executable as desired by the user. Next, a
'save optimized binary module' procedure 707 saves the newly
optimized binary module. This saved module is then processed
by a linker application (or its equivalent) to generate an
optimized binary executable at a 'link binary module'
procedure 708. An 'execute save data set specific binary
executable on dataset' procedure 709 then executes the
optimized binary executable on the data set to achieve the
optimized performance with respect to that data set. Finally,
the process 700 completes through an 'end' terminal 711.

One skilled in the art will understand that the previously 
described techniques will optimize a binary executable for 
use with a particular data set. Further, one skilled in the art 
will also understand that one can perform multiple optimi- 
zations on a binary executable to generate multiple binaries 
each optimized for use with a particular data set. An addi- 
tional capability allows the user to combine profiles from 
multiple data sets of interest or from multiple uses of a given 
data set to optimize the binary executable for those specific 
data sets of interest. 
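
Combining profiles from several data sets can be sketched as a merge of per-data-set execution counts. The function name, the block labels, and the optional per-profile weights below are hypothetical; the disclosure does not specify how profiles are combined.

```python
from collections import Counter

def combine_profiles(profiles, weights=None):
    """Merge per-data-set execution counts into one (optionally weighted) profile."""
    weights = weights or [1] * len(profiles)
    combined = Counter()
    for profile, weight in zip(profiles, weights):
        for label, count in profile.items():
            combined[label] += weight * count
    return combined

# Hypothetical profiles gathered from two data sets of interest.
profile_a = Counter({"parse": 90, "render": 10})
profile_b = Counter({"parse": 20, "render": 80})

merged = combine_profiles([profile_a, profile_b])
weighted = combine_profiles([profile_a, profile_b], weights=[1, 3])

hottest = max(merged, key=merged.get)
hottest_weighted = max(weighted, key=weighted.get)
print(hottest, hottest_weighted)
```

The weights are one possible way to express that some data sets matter more than others; with equal weights the merged profile favors 'parse', while weighting the second data set three times as heavily favors 'render'.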

From the foregoing, it will be appreciated that the inven- 
tion has (without limitation) the following advantages: 

1) The invention optimizes the binary executable for the 
target computer that will execute the binary executable 
without using the binary executable's source code. 
Thus, the invention enables more efficient operation of 
the binary executable on the target computer. 

2) The invention enables a computer user to optimize the 
binary executable dependent on the data set processed 
by the binary executable. 

3) The invention allows a binary executable to use new 
operating system services instead of older services 
when appropriate. 

4) The invention allows an application developer to 
provide an unoptimized binary executable of an appli- 
cation and for the developer's customers to optimize 
the application for their own specific computers. 

5) The invention only requires one implementation of 
compiler optimization code for both source code opti- 
mized compilation and binary code re-optimization. 
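
As an illustration of advantage 3, the following minimal sketch rewrites operating system service invocations in a toy intermediate representation. The tuple-based representation, the service names and the replacement table are all hypothetical and are not part of the disclosure; a practical implementation would also need to confirm that the preferred service accepts compatible arguments.

```python
# Hypothetical table mapping older service names to preferred replacements.
PREFERRED_SERVICE = {"old_open": "new_open", "old_wait": "new_wait"}

def replace_service_invocations(ir, preferred=PREFERRED_SERVICE):
    """Return a copy of the IR with older service calls replaced by preferred ones."""
    rewritten = []
    for op, *operands in ir:
        # A call's first operand is the name of the invoked service.
        if op == "call" and operands[0] in preferred:
            operands[0] = preferred[operands[0]]
        rewritten.append((op, *operands))
    return rewritten

# Toy IR: (operation, operand, ...) tuples using virtual register names.
ir = [
    ("load", "v1", "path"),
    ("call", "old_open", "v1"),
    ("call", "compute", "v1"),
]
new_ir = replace_service_invocations(ir)
print(new_ir)
```

Only the invocation listed in the table is rewritten; the other instructions, and the input list itself, are left unchanged.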

Although the present invention has been described in 
terms of the presently preferred embodiments, one skilled in 
the art will understand that various modifications and alter- 
ations may be made without departing from the scope of the 
invention. Accordingly, the scope of the invention is not to 
be limited to the particular invention embodiments discussed
herein, but should be defined only by the appended claims 
and equivalents thereof. 

What is claimed is: 

1. A computer controlled method for converting a first 
binary executable that executes within a first application
binary interface (ABI) environment to a second binary 
executable that executes within a second ABI environment, 
said method performed by a high level language compiler 
and comprising: 

converting said first binary executable to an intermediate
representation;
approximating the state of the compiler used to compile 
the first binary executable at least partially based on 
annotation information; 
removing from the intermediate representation an archi- 
tectural related optimization that depends on said first 
ABI environment based upon the approximating of the 
state of the compiler used to compile the first binary 
executable, wherein said removing removes register 
availability limitations from within said intermediate 
representation; 
optimizing said intermediate representation for said sec- 
ond ABI environment, at least partially based on an 
execution profile; and 
generating said second binary executable based on the 
optimizing of said intermediate representation.

2. The computer controlled method of claim 1 wherein
step (b) further comprises steps of: 

(b1) detecting a pipeline scheduling artifact within said
intermediate representation; and

(b2) removing said pipeline scheduling artifact from said
intermediate representation.

3. The computer controlled method of claim 1 wherein 
said first ABI environment is substantially identical to said 
second ABI environment and step (c) further comprises: 

(c1) analyzing an execution profile resulting from execution
of said first executable binary executable; and

(c2) optimizing said intermediate representation depen- 
dent on said execution profile. 

4. The computer controlled method of claim 1 wherein 
said first ABI environment is substantially identical to said 
second ABI environment and step (c) further comprises: 

(c1) analyzing an execution profile to determine
memory-cache performance information; and

(c2) optimizing said intermediate representation depen- 
dent on said memory-cache performance information 
to improve performance. 

5. The computer controlled method of claim 1 wherein 
said architectural related optimization is a first operating 
system service invocation and step (c) further comprises 
replacing said first operating system service invocation with 
a preferred operating system service invocation. 

6. The computer controlled method of claim 5 wherein 
said first operating system service invocation is an obsolete 
operating system service invocation. 

7. An apparatus having a central processing unit (CPU) 
and a memory coupled to said CPU for converting, by a high 
level language compiler, a first binary executable that 
executes within a first application binary interface (ABI) 
environment to a second binary executable that executes 
within a second ABI environment, said apparatus comprises: 

a binary disassembler mechanism configured to convert 
said first executable binary executable to an interme- 
diate representation; 

a compiler approximation mechanism configured to 
approximate the state of the compiler used to create 
said first binary executable, at least partially based on 
annotation information; 

an intermediate representation normalization mechanism 
configured to remove an architectural related optimi- 
zation from the first binary executable based on the 
approximation by the compiler approximation 
mechanism, wherein said intermediate representation 
normalization mechanism further comprises a register 
virtualization mechanism configured to remove register 
availability limitations from within said intermediate 
representation; 

an intermediate representation optimizer mechanism con- 
figured to optimize said intermediate representation for 
said second ABI environment, at least partially based 
on an execution profile; and 

a code generation mechanism configured to generate said 
second binary executable using said intermediate 
representation, said intermediate representation opti- 
mized for said second ABI environment. 

8. The apparatus of claim 7 wherein the intermediate 
representation normalization mechanism further comprises: 

a pipeline scheduling detection mechanism configured to 
detect a pipeline scheduling artifact within said inter- 
mediate representation; and 

a pipeline artifact removal mechanism configured to
remove said pipeline scheduling artifact, detected by
the pipeline scheduling detection mechanism, from said
intermediate representation. 

9. The apparatus of claim 7 wherein said first ABI 
environment is substantially identical to said second ABI 
environment and the intermediate representation optimizer
mechanism further comprises: 

an execution analysis mechanism configured to analyze 
an execution profile resulting from execution of said 
first executable binary executable; and 

an execution-profile optimization mechanism configured
to optimize said intermediate representation dependent
on said execution profile analyzed by the execution
analysis mechanism. 

10. The apparatus of claim 7 wherein said first ABI 
environment is substantially identical to said second ABI 
environment and has at least one memory cache mechanism 
and the intermediate representation optimizer mechanism 
further comprises: 

a cache analysis mechanism configured to analyze an
execution profile to determine memory-cache perfor- 
mance information; and 

a cache optimization mechanism configured to optimize 
said intermediate representation dependent on said 
memory-cache performance information to improve
performance of said at least one memory cache mecha- 
nism. 

11. The apparatus of claim 7 wherein said architectural 
related optimization is a first operating system service 
invocation and the intermediate representation optimizer
mechanism further comprises an operating system invoca- 
tion replacement mechanism configured to replace said first 
operating system service invocation with a preferred oper- 
ating system service invocation. 

12. The apparatus of claim 11 wherein said first operating
system service invocation is an obsolete operating system 
service invocation. 

13. A computer program product comprising: 

a computer usable storage medium having computer 
readable code embodied therein for causing a computer
to convert, by a high level language compiler, a first 
binary executable that executes within a first applica- 
tion binary interface (ABI) environment to a second 
binary executable that executes within a second ABI 
environment, said computer readable code comprising:

computer readable program code configured to cause said 
computer to effect a binary disassembler mechanism 
configured to convert said first binary executable to an 
intermediate representation; 

computer readable program code configured to cause said 
computer to approximate the state of the compiler used 
to compile said first binary executable, at least partially 
based on annotation information; 

computer readable program code configured to cause said
computer to effect an intermediate representation nor- 
malization mechanism configured to remove an archi- 
tectural related optimization based on the approxima- 
tion of the state of the compiler used to compile the first 
binary executable;

computer readable program code configured to cause said 
computer to effect an intermediate representation optimizer
mechanism configured to optimize said intermediate
representation for said second ABI environment,
at least partially based on an execution profile;

computer readable program code configured to cause said
computer to effect a code generation mechanism configured
to generate said second binary executable using
said intermediate representation optimized for said 
second ABI environment; and 
wherein the intermediate representation normalization 
mechanism comprises computer readable program 
code configured to cause said computer to effect a 
register virtualization mechanism configured to remove 
register availability limitations from within said inter- 
mediate representation. 

14. The computer program product of claim 13 wherein 
the intermediate representation normalization mechanism 
further comprises: 

computer readable program code configured to cause said 
computer to effect a pipeline scheduling detection 
mechanism configured to detect a pipeline scheduling 
artifact within said intermediate representation; and 

computer readable program code configured to cause said 
computer to effect a pipeline artifact removal mecha- 
nism configured to remove said pipeline scheduling 
artifact, detected by the pipeline scheduling detection 
mechanism, from said intermediate representation. 

15. The computer program product of claim 13 wherein 
said first ABI environment is substantially identical to said 
second ABI environment and the intermediate representa- 
tion optimizer mechanism further comprises: 

computer readable program code configured to cause said 
computer to effect an execution analysis mechanism 
configured to analyze an execution profile resulting 
from execution of said first executable binary execut- 
able; and 

computer readable program code configured to cause said 
computer to effect an execution-profile optimization 
mechanism configured to optimize said intermediate 
representation dependent on said execution profile ana- 
lyzed by the execution analysis mechanism. 

16. The computer program product of claim 13 wherein 
said first ABI environment is substantially identical to said 
second ABI environment and has at least one memory cache 
mechanism and the intermediate representation optimizer 
mechanism further comprises: 

computer readable program code configured to cause said 
computer to effect a cache analysis mechanism config- 
ured to analyze an execution profile to determine 
memory-cache performance information; and 

computer readable program code configured to cause said 
computer to effect a cache optimization mechanism 
configured to optimize said intermediate representation 
dependent on said memory-cache performance infor- 
mation to improve performance of said at least one 
memory cache mechanism. 

17. The computer program product of claim 13 wherein 
said architectural related optimization is a first operating 
system service invocation and the intermediate representa- 
tion optimizer mechanism further comprises computer read- 
able program code configured to cause said computer to 
effect an operating system invocation replacement mecha- 
nism configured to replace said first operating system ser- 
vice invocation with a preferred operating system service 
invocation. 

18. The computer program product of claim 17 wherein 
said first operating system service invocation is an obsolete 
operating system service invocation.